The Scalability of FFT on Parallel Computers

نویسندگان

  • Anshul Gupta
  • Vipin Kumar
چکیده

In this paper, we present the scalability analysis of parallel Fast Fourier Transform algorithm on mesh and hypercube connected multicomputers using the isoefficiency metric. The isoefficiency function of an algorithm architecture combination is defined as the rate at which the problem size should grow with the number of processors to maintain a fixed efficiency. On the hypercube architecture, a commonly used parallel FFT algorithm can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in problem size. But there is a limit on the achievable efficiency and this limit is determined by the ratio of CPU speed and communication bandwidth of the hypercube channels. Efficiencies higher than this threshold value can be obtained if the problem size is increased very rapidly. If the hardware supports cut-through routing, then this threshold can also be overcome by using an alternate less scalable parallel formulation. The scalability analysis for the mesh connected multicomputers reveals that FFT cannot make efficient use of large-scale mesh architectures unless the bandwidth of the communication channels is increased as a function of the number of processors. We also show that it is more cost-effective to implement the FFT algorithm on a hypercube rather than a mesh despite the fact that large scale meshes are cheaper to construct than large hypercubes. Although the scope of this paper is limited to the Cooley-Tukey FFT algorithm on a few classes of architectures, the methodology can be used to study the performance of various FFT algorithms on a variety of architectures such as SIMD hypercube and mesh architectures and shared memory architecture. ∗This work was supported by IST/SDIO through the Army Research Office grant # 28408-MA-SDI to the University of Minnesota and by the University of Minnesota Army High Performance Computing Research Center under contract # DAAL03-89-C-0038.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluating Scalability of the 2-d Fft on Parallel Computers

Parallel computers have demonstrated a remarkable potential for achieving high performance at a reasonable cost for many computer vision and image processing (CVIP) applications. A major obstacle to the use of parallel computers is the lack of a universally accepted metric to study the scalability of parallel algorithms and architectures. In this paper, we apply diierent scalability measures to...

متن کامل

Parallel implementation and scalability analysis of 3D Fast Fourier Transform using 2D domain decomposition

3D FFT is computationally intensive and at the same time requires global or collective communication patterns. The efficient implementation of FFT on extreme scale computers is one of the grand challenges in scientific computing. On parallel computers with a distributed memory, different domain decompositions are possible to scale 3D FFT computation. In this paper, we argue that 2D domain decom...

متن کامل

Parallel scaling of Teter’s minimization for Ab Initio calculations

We propose a parallelization scheme for the conjugate gradient method by Teter et. al. and report a detailed analysis of its scalability. We use MPI collective operations exclusively to take advantage of optimized collective implementations with possible hardware support. Our parallel conjugate gradient calculation can be applied in addition to the already implemented parallelism in the applica...

متن کامل

Performance analysis of pure MPI versus MPI+OpenMP for Jacobi Iteration and a 3D FFT on the Cray XT5

Today many high performance computers are collections of shared memory compute nodes with each compute node having one or more multi-core processors. When writing parallel programs for these machines, one can use pure MPI or various hybrid approaches using MPI and OpenMP. Since OpenMP threads are lighter weight than MPI processes, one would expect that hybrid approaches will achieve better perf...

متن کامل

Parallel FFT and Quick-Merge Sort on the Reflective Memory Networked Computers and a Cluster of Workstations

This paper is concerned with parallel FFT and Quick-Merge Sort. They are implemented on computers interconnected by VMIC 5579 reflective memory and a cluster of workstations (PCs) interconnected via Fast Ethernet. Message passing interface (MPI) parallel library is used for communication in a cluster of workstations. An improved parallel FFT is also presented to decrease an execution time in th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • IEEE Trans. Parallel Distrib. Syst.

دوره 4  شماره 

صفحات  -

تاریخ انتشار 1993